Routing marine mammal (and other marine taxa) telemetry tracks around land has been on the wish list of bio-loggers for many years. A number of approaches have been pursued over the years, but many have either required censoring or cherry-picking of data, or extensive computational time. Here, we rely on the same routing technology many of us use daily when we ask our phones for directions from point A to point B. In brief, a network of nodes and edges is created that represents possible paths through water (and around land) within the study area. Creating this network can take many hours, but once it is built it does not need to be recreated for that study area. The graph is then used to determine the shortest path between two points through the network. This step is fast, so hundreds of tracks can be re-routed around land in just a few minutes. To our knowledge, this network/graph approach has not been applied to the movement of marine animals. The approach we outline here (and implement within the ‘pathroutr’ package) is intended to be used in conjunction with a predicted track output from a movement model (e.g. from the R package crawl). To reduce computational time, we focus only on fixing the portions of the predicted track that cross land.
The package does rely on the availability of a PostgreSQL database with both the PostGIS and pgRouting extensions installed. This database could be running locally, within your enterprise/academic IT infrastructure, or on a cloud service. While detailed knowledge of PostgreSQL, PostGIS, and pgRouting is not a requirement, the large size and computational demands of the network mean you will likely benefit from some investment in optimizing the server configuration (i.e., you might need a lot of memory, and more than a few CPUs) if your study area is large or includes complicated land structures.
library(tidyverse)
library(crawl)
library(ptolemy)
library(sf)
library(tictoc)
library(mapview)
library(RPostgres)
library(here)
This approach relies on the pgRouting extension to PostgreSQL, so you’ll need access to a properly configured database server. This initial example is based on a local installation on macOS, where the easiest option is to install with homebrew:
brew install postgresql
brew install postgis
brew install pgrouting
At this point, you’ll need to do some additional setup on your macOS machine to get postgres up and running. If you manage homebrew with the same user account as you do all your R work, then you can skip this first part (just run initdb and your database cluster will be created in the default location). Otherwise, you need to make sure the postgres database cluster is initialized in a directory you have full read/write access to.
mkdir ~/postgres
initdb -D /Users/$(whoami)/postgres
pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile start
createdb
pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile stop
The above steps initialize the database cluster and then call pg_ctl to start it. Then, createdb creates a database (the name defaults to $(whoami), but you may want to give your database a name that is relevant to your study area). The last statement stops the server.
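For example, to create a database with a study-specific name instead of the default (the name seal_routing here is just a placeholder):

createdb seal_routing

You would then connect to it at the command line with psql seal_routing and, later, from R with dbConnect(RPostgres::Postgres(), dbname = 'seal_routing').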
To make things a bit easier on ourselves, we’ll create a few aliases in our .bash_profile for the start and stop commands. Open up .bash_profile in an editor and add the following lines
alias pg_start='pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile start'
alias pg_stop='pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile stop'
Lastly, let’s start up our postgres server by typing pg_start in the terminal. Note, this will not start your server automatically after a reboot; there are ways to set that up, and a few minutes searching the internet will likely guide you to success.
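One common option, if you installed postgresql via homebrew as above, is homebrew’s own service manager:

brew services start postgresql

This registers postgres as a launchd service that starts at login; brew services stop postgresql stops and unregisters it.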
Now, you need to install the PostGIS and pgRouting extensions into your postgres database. We can do this from the command line. If you provided a name for your database, you’ll want to include it after the psql command. This will not work if the brew install steps above were not successful.
psql
CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;
SELECT PostGIS_Version();
SELECT pgr_version();
exit
It is also a good idea to change some of the default configuration settings for your postgres/postgis server to optimize performance for the complicated spatial operations and queries ahead. These are settings I chose for a relatively new MacBook Pro. Note that synchronous_commit = 'off' and fsync = 'off' trade crash safety for speed; they are reasonable for a scratch analysis database you can rebuild, but should not be used for data you cannot afford to lose.
ALTER SYSTEM SET work_mem = '1500MB';
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
ALTER SYSTEM SET synchronous_commit = 'off';
ALTER SYSTEM SET fsync = 'off';
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET effective_cache_size = '10GB';
ALTER SYSTEM SET max_wal_size = '4GB'; -- checkpoint_segments was removed in PostgreSQL 9.5; max_wal_size replaces it
SELECT pg_reload_conf();
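After reloading, it is worth confirming the new values took effect; a quick check from psql:

SHOW work_mem;
SHOW shared_buffers;

Settings changed via ALTER SYSTEM are written to postgresql.auto.conf. Most of the values above only require the pg_reload_conf() call, but shared_buffers and wal_buffers require a full server restart before they take effect.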
First, we will load the example seal track from the crawl package that we’ll use for our demonstration. While we’re at it, let’s also load our land barrier layer by pulling the Alaska data from the ptolemy package and cropping it to our study area. The final sf::st_union() is important because we need land_barrier to be a single multipolygon geometry. Also, note we are transforming everything into the epsg:3338 (Alaska Albers) projection.
data("harborSeal_sf")

harborSeal_sf <- harborSeal_sf %>%
  st_transform(3338)

ak <- ptolemy::alaska()

## importGSHHS status:
## --> Pass 1: complete: 10988 bounding boxes within limits.
## --> Pass 2: complete.
## --> Clipping...

land_barrier <- ak %>%
  st_crop(harborSeal_sf) %>%
  sf::st_collection_extract('POLYGON') %>%
  sf::st_cast('POLYGON') %>%
  sf::st_union()
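A quick sanity check that the union produced the geometry type we need (a sketch; the exact result depends on your cropped region):

sf::st_geometry_type(land_barrier)

This should typically report a single MULTIPOLYGON feature. If you instead see a GEOMETRYCOLLECTION or many separate POLYGON features, revisit the st_collection_extract() and st_union() steps before pushing the layer to the database.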
The next thing we need to do is connect our R session to the database so we can push up some data and run some commands. You’ll need the RPostgres package installed. If you chose a different name for your database, include it as an argument to the dbConnect() function (e.g. dbname = 'my_db').
con <- dbConnect(RPostgres::Postgres())
And, we’ll push up our land_barrier and harborSeal_sf objects. Since these are sf objects, the sf::st_write() function allows us to push them to the database while maintaining all of the spatial features.
sf::st_write(land_barrier, con, "land_barrier", overwrite = TRUE)
sf::st_write(harborSeal_sf, con, "harbor_seal", overwrite = TRUE)
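A quick check that both tables made it into the database:

DBI::dbListTables(con)

You should see land_barrier and harbor_seal among the returned table names (along with PostGIS system tables such as spatial_ref_sys).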
Before we move on, let’s go ahead and plot our spatial objects with mapview().
mapview::mapview(list(land_barrier, harborSeal_sf))
Next, we’ll pass along some SQL statements that make sure we drop any tables that might already exist. Be careful here: you’ll soon learn that creating the vis_graph table can take a very long time, so don’t inadvertently drop it unless you have the time to re-create it. The last step creates a spatial index for our land_barrier to, hopefully, speed up some of our spatial query operations.
dbExecute(con,"drop table if exists graph_lines")
## [1] 0
dbExecute(con,"drop table if exists vis_graph")
## [1] 0
dbExecute(con,"drop table if exists vis_graph_vertices_pgr")
## [1] 0
dbExecute(con,"drop table if exists land_barrier_buffer")
## [1] 0
dbExecute(con,"drop table if exists coastal_corridor")
## [1] 0
dbExecute(con,"drop table if exists coastal_pts")
## [1] 0
dbExecute(con,
"CREATE INDEX land_barrier_gist
ON land_barrier USING GIST (geom)")
## [1] 0
Next, we are going to create a coastline buffer/corridor that will constrain our network graph.
Let’s first buffer our land_barrier table by 3 kilometers
dbExecute(con,
"create table land_barrier_buffer as
select ST_Union(ST_Buffer(land_barrier.geom,3000)) geometry
from land_barrier"
)
## [1] 1
And make sure we have a primary key established, as well as a spatial index:
dbExecute(con,
"alter table land_barrier_buffer add column gid serial primary key"
)
## [1] 0
dbExecute(con,
"CREATE INDEX land_barrier_buffer_gist
ON land_barrier_buffer USING GIST ( geometry )")
## [1] 0
Now, we create the coastal corridor where we will focus our network:
dbExecute(con,
"create table coastal_corridor as
select ST_Difference(land_barrier_buffer.geometry,land_barrier.geom) geometry
from land_barrier, land_barrier_buffer"
)
## [1] 1
dbExecute(con,
"CREATE INDEX coastal_corridor_gist
ON coastal_corridor USING GIST ( geometry )"
)
## [1] 0
And then we’ll create a grid of points within the coastal corridor at 1 km spacing. You can choose a different grid resolution for your study area; the main cost is that more grid points means a larger network, which adds to the initial network creation time and, later, the time it takes to find the shortest path.
dbExecute(con,
"create table coastal_pts as
select (ST_PixelAsCentroids(ST_AsRaster(geometry,1000.0,1000.0))).geom geometry
from coastal_corridor"
)
## [1] 13865
dbExecute(con,
"alter table coastal_pts add column gid serial primary key"
)
## [1] 0
dbExecute(con,
"CREATE INDEX coastal_pts_gist
ON coastal_pts USING GIST ( geometry )")
## [1] 0
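To get a feel for how quickly the network grows with grid resolution, note that n candidate points generate n*(n-1)/2 candidate point pairs in the line-building step that follows. A quick back-of-the-envelope calculation in R, using the 13865 coastal points created above:

n <- 13865
n * (n - 1) / 2
## [1] 96112180

So roughly 96 million candidate lines get tested against the land barrier, which is why the next step is the expensive one; halving the grid spacing would roughly quadruple this number.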
Now, we are ready to start building our visibility graph. First, we’ll create lines from every coastal point to every other coastal point. Then, we’ll select out only those lines that fall completely within the coastal corridor. This is the key step that ensures our network only includes routes that pass through water. It is also the step that takes the longest (roughly an hour in this example; see the tictoc timings).
tic("total")
tic("makeline")
dbExecute(con,
"create table graph_lines as
with lines as (
select id, source, target, ll.line_geom the_geom from
(select
row_number() over () as id,
p1.gid as source,
p2.gid as target,
st_makeline(p1.geometry, p2.geometry)::geometry(LINESTRING) as line_geom
from coastal_pts p1, coastal_pts p2 where p1.gid < p2.gid) ll
)
select lines.id, lines.source, lines.target, lines.the_geom
from lines, land_barrier
where not st_intersects(lines.the_geom, land_barrier.geom)
and st_within(lines.the_geom, st_envelope(land_barrier.geom))")
## [1] 4647096
toc()
## makeline: 3671.981 sec elapsed
tic("st_within")
dbExecute(con,
"create table vis_graph as
select g.id, g.source, g.target, g.the_geom
from coastal_corridor, graph_lines g
where st_within(g.the_geom, coastal_corridor.geometry)")
## [1] 445302
toc()
## st_within: 49.448 sec elapsed
toc()
## total: 3721.431 sec elapsed
dbExecute(con,
"alter table vis_graph add primary key (id)"
)
## [1] 0
Let’s take a quick moment to view a sampling of these network lines (edges) within our study area. To save time, we will plot only 10% of the lines.
mapview::mapview(sf::st_read(con,
query = "select * from vis_graph tablesample system (10)"))
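With the network in place, a shortest path between two network vertices can be queried with pgRouting’s pgr_dijkstra. The sketch below is illustrative only: the vertex ids 1 and 500 are placeholders (real start/end vertices would come from snapping track locations to the nearest coastal_pts gid), edge cost is taken as line length, and the graph is treated as undirected.

path <- sf::st_read(con, query = "
  select g.id, g.the_geom
  from pgr_dijkstra(
    'select id, source, target, st_length(the_geom) as cost from vis_graph',
    1, 500, directed := false) as d
  join vis_graph g on d.edge = g.id")

The result is an sf object containing the edges along the least-cost route, which can be plotted with mapview() just like the full network.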